Abstract 

We view the folding of RNA sequences as a map that assigns to each sequence a pattern of base pairings, known as its secondary structure. The preimages of this map, i.e. the neutral networks associated with a structure s, can be constructed as random graphs. 

By interpreting the secondary structure as biological information, we can formulate the so-called Error Threshold of Shapes as an extension of the concept of an error threshold in the single peak landscape introduced by Eigen et al. [5]. Analogous to the approach of Derrida & Peliti [3] for a flat landscape, we investigate the spatial distribution of the population on the neutral network. 

On the one hand, this model of a single shape landscape allows the derivation of analytical results; on the other hand, it makes it possible to study various scenarios by means of simulations, e.g. the interaction of two different networks [29]. It turns out that the intersection of two sets of compatible sequences (with respect to the pair of secondary structures) plays a key role in the search for "fitter" secondary structures. 



1. Introduction 



The first theory of biological evolution was presented in the nineteenth century by Charles Darwin (1859) in his famous book The Origin of Species. It is based on two fundamental principles: natural selection and erroneous reproduction, i.e. mutation. The first principle leads to the concept of survival of the fittest and the second one to diversity, where fitness is an inherited characteristic property of a species and can basically be identified with its reproduction rate. 

In contrast to Darwin's theory of evolution, the role of stochastic processes has also been emphasized. Wright [31, 32] saw an important role for genetic drift in improving the "evolutionary search capacity" of the whole population. He saw genetic drift merely as a process that could improve evolutionary search, whereas Kimura proposed that the majority of changes in evolution at the molecular level are the result of random drift of genotypes [18, 19]. The neutral theory of Kimura does not assume that selection plays no role, but denies that any appreciable fraction of molecular change is caused by selective forces. Over the last few decades, however, there has been a shift of emphasis in the study of evolution. Instead of focusing on the differences in the selective value of mutants and on population genetics, interest has moved to evolution through natural selection as an optimization algorithm on complex fitness landscapes. However, let us briefly return to Darwin and his minimal requirements for adaptation: 

• a population of objects that are capable of replication, 

• a sufficiently large number of variants of those objects, 

• occasional variations which are inheritable, and 

• restricted proliferation which is constrained by limited resources. 

In this paper we restrict ourselves to RNA, possibly the simplest entities that actually fulfill all four requirements listed above. In RNA the fundamental dichotomy between genotypic legislative and phenotypic executive is realized: the sequence carries the genotype, while the phenotype is manifested by the secondary structure. In this context the mapping from RNA sequences to secondary structures is of central importance, since fitness is evaluated on the level of structures. This mapping naturally induces a landscape on the RNA sequences, independent of any particular evaluation of RNA structures [27]. Following the approach in [23], we can construct these sequence-structure maps by random graphs. Without using any empirical parameters from RNA melting experiments, we obtain the so-called neutral networks, i.e. sets of sequences that all fold into one and the same structure. It can be shown that these neutral networks and the transitions between them are "essential" structural elements of the RNA folding landscape [24]. These landscapes combine two approaches to biological evolution that at first sight seem contradictory: Darwin's survival of the fittest and Kimura's neutral random drift. 



2. Realistic Landscapes 



2.1. Fitness Landscapes and the Molecular Quasispecies 

In this contribution we consider the simplest example of Darwinian evolution, namely a population $\mathcal{P}$ of haploid individuals competing for a common resource. 

Following the work of Eigen [4, 5], we consider a population of RNA sequences of fixed length $n$ in a stirred flow reactor whose total RNA population fluctuates around a constant capacity $N$. The definition of the overall replication rate of a sequence, together with the constrained population size, specifies our selection criterion. 

In the limit of infinite populations its evolution is described by the quasispecies equation 

$$ \frac{dc_x}{dt} = \sum_{y} A_y Q_{xy} c_y - c_x \Phi(t), \qquad \Phi(t) = \sum_{y} A_y c_y, \qquad (1) $$

where $c_x$ denotes the concentration of genotype $x$ (normalized such that $\sum_x c_x = 1$), $A_x$ the replication rate of this genotype, and $Q$ is the matrix of mutation probabilities, $Q_{xy}$ being the probability for a parent of type $y$ to have an offspring of type $x$. The replication rates considered as a function of the genotypes $x$ form a fitness landscape^ [31] over the sequence space [6]. The total population is kept constant by a flux $\Phi$ compensating the production of new offspring. The model mimics the asynchronous serial transfer technique [20]. 
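As an illustration of equation (1), the following minimal Python sketch integrates the quasispecies equation with an explicit Euler scheme. It is not the implementation used in this contribution: the binary alphabet, the chain length n = 4, the per-digit error rate p, the hypothetical single peak rates A and all function names are our own assumptions. Independent point errors give $Q_{xy} = p^{d(x,y)}(1-p)^{n-d(x,y)}$, with $d$ the Hamming distance, so every column of $Q$ sums to one.

```python
import itertools


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def mutation_matrix(seqs, p):
    """Q[x][y]: probability that parent y produces offspring x.

    Assumes independent point errors with per-digit rate p on a binary
    alphabet, so each column of Q sums to one.
    """
    n = len(seqs[0])
    return [[p ** hamming(x, y) * (1 - p) ** (n - hamming(x, y)) for y in seqs]
            for x in seqs]


def quasispecies_step(c, A, Q, dt):
    """One explicit Euler step of dc_x/dt = sum_y A_y Q_xy c_y - c_x * Phi."""
    phi = sum(a * cy for a, cy in zip(A, c))  # dilution flux
    m = len(c)
    return [c[x] + dt * (sum(A[y] * Q[x][y] * c[y] for y in range(m)) - c[x] * phi)
            for x in range(m)]


# Hypothetical example: n = 4, single peak of height sigma at 0000.
n, p, sigma = 4, 0.05, 10.0
seqs = list(itertools.product((0, 1), repeat=n))
A = [sigma if s == (0,) * n else 1.0 for s in seqs]
Q = mutation_matrix(seqs, p)
c = [1.0 / len(seqs)] * len(seqs)
for _ in range(2000):
    c = quasispecies_step(c, A, Q, dt=0.01)
print(f"total = {sum(c):.6f}, master = {c[0]:.3f}")
```

Because the columns of $Q$ sum to one, the update preserves $\sum_x c_x = 1$ up to rounding, which is exactly the role of the flux $\Phi$.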

As in the laboratory our RNA populations are tiny compared to the size of the 
sequence space. This fact forces a description in terms of stochastic chemical 
reaction kinetics. Two methods are appropriate to model stochastic processes: 

• Gillespie [14] has described an algorithm for simulating the complete master equation of the chemical reaction network. We have used the implementation by Fontana et al. [10] for all computer simulations reported here. An individual sequence $I_k$ can undergo two stochastic reaction events: either $I_k$ is removed by the dilution flow, or it replicates with an average rate $A_k$ that corresponds to the reaction rate constant in equ. (1). When an individual sequence is replicated, each base is copied with fidelity $q$. The overall model mimics the asynchronous serial transfer technique [20]. 

• While giving a complete description of the dynamics, Gillespie's algorithm does not allow for a detailed mathematical analysis. Therefore we approximate the quasispecies model by a birth-death process, following the lines of Nowak & Schuster [21] and Derrida & Peliti [3]. All analytical results presented in this contribution are based on this approach. 



^For a recent review on fitness landscapes see, e.g., the contribution by Schuster and Stadler in the proceedings volume of the Telluride meeting 1993 [25]. 
^In general all rate and equilibrium constants of the replication process, and hence also the overall rate of RNA synthesis, are functions of the 3D structure. 
Fig. 1. Mapping of a genotype into its functional representation. The process is partitioned into two phases. The first phase is the complex mapping $\psi$ of sequences into secondary structures (phenotypes); here neutrality plays a crucial role. In the second phase we omit the construction of the spatial 3D structure and the evaluation of its function; instead we arbitrarily assign a fitness value to each phenotype by the mapping $\eta$. 



This suggests decomposing the computation of the fitness into two steps: first we construct the shape of the RNA (phenotype) from its sequence (genotype), and then we consider the evaluation of this phenotype by its environment (figure 1). The effect of this decomposition is that we are left with two, hopefully simpler, problems, namely (1) to model the relation between sequences and structures in the special case of RNA, and (2) to devise a sensible model for the evaluation of these structures. The combinatory map of RNA secondary structures, i.e., the map assigning a shape $\psi(x)$ to each sequence $x$ in the sequence space $\mathcal{Q}^n_\alpha$, will be discussed in the next section. 

Formally, we consider the evaluation $\eta$ assigning a numerical fitness value to each shape in the shape space $\mathcal{S}$. As even less is known in general about structure-function relations than about sequence-structure relations, we will use the simplest model for the evaluation $\eta$: we assign arbitrary fitness values $\eta(s_i)$ to specially chosen shapes $s_i$ and a fitness value $\eta(\beta)$ to the background $\beta$ (i.e. the remaining shapes), with the condition $\eta(s_i) > \eta(\beta)$ for all $i$. 

Tying things together, we are considering a fitness landscape of the form 

$$ f(x) = \eta(\psi(x)). \qquad (2) $$



2.2. The Combinatory Map of RNA Secondary Structures 

Having defined the evaluation $\eta$ of the structures, we now turn to the sequence-structure relation $\psi$. The phenotype of an RNA sequence is modeled by its minimum free energy (MFE) secondary structure. 

The evidence compiled in [8, 9, 11, 15, 24, 26] shows that the combinatory map of RNA secondary structures has the following basic properties: 



(1) Sequences folding into one and the same structure are distributed randomly in the set of "compatible sequences", which will be discussed in detail below. 

(2) The frequency distribution of structures is sharply peaked (there are comparatively few common structures and many rare ones). Nevertheless, the number of different frequent structures increases exponentially with the chain length. 

(3) Sequences folding into all common structures are found within (relatively) 
small neighborhoods of any random sequence. 

(4) The shape space contains extended "neutral networks" joining sequences 
with identical structures. "Neutral paths" percolate the set of compatible 
sequences. 

(5) There is a high degree of neutrality, that is, a substantial fraction of all mutations leaves the secondary structure completely unchanged (see figure 2). 

These features are robust. 
Fig. 2. Frequency of neutral mutations ($\lambda_u$ and $\lambda_p$, respectively; see section 2.3), counted separately for single base exchanges in unpaired regions (open symbols) and base pair exchanges (full symbols) for different alphabets. 



A sequence $x$ is said to be compatible with a secondary structure $s$ if the nucleotides $x_i$ and $x_j$ at sequence positions $i$ and $j$ can pair whenever $(i, j)$ is a base pair in $s$. Note that this condition by no means implies that $x_i$ and $x_j$ will actually form a base pair in the structure $\psi(x)$ obtained by some folding algorithm. The set of all sequences compatible with a secondary structure $s$ will be denoted by $C[s]$. There are two types of neighbors of a sequence $x \in C[s]$: each mutation in a position $k$ which is unpaired in the secondary structure $s$ leads again to a sequence compatible with $s$, while point mutations in the paired regions of $s$ will in general produce sequences that are not compatible with $s$. This problem can be overcome by modifying the notion of neighborhood. If we allow the exchange of base pairs instead of single nucleotides in the paired regions of $s$, we always end up with sequences compatible with $s$. This definition of neighborhood allows us to view $C[s]$ as a graph. It can be shown [23] that this graph is the direct product of two generalized hypercubes 

$$ C[s] = \mathcal{Q}^{n_u}_\alpha \times \mathcal{Q}^{n_p}_\beta, \qquad (3) $$

where $n_u$ is the number of unpaired positions in $s$, $\alpha$ is the number of different nucleotides, i.e., $\alpha = 4$ in the case of natural RNAs, $n_p$ is the number of base pairs in $s$, and $\beta$ is the number of different types of base pairs that can be formed by the $\alpha$ different nucleotides; for natural RNAs we have $\beta = 6$. The sequence length is $n = n_u + 2n_p$. 
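The definition of compatibility and the size $|C[s]| = \alpha^{n_u}\beta^{n_p}$ implied by equation (3) can be checked with a short sketch. Writing structures in dot-bracket notation and the helper names are our assumptions; the natural pair alphabet with $\beta = 6$ is taken from the text.

```python
# Pair alphabet of natural RNA: beta = 6 types of base pairs.
RNA_PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}


def base_pairs(structure):
    """Base pairs (i, j), i < j, of a secondary structure in dot-bracket notation."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(seq, structure, pair_alphabet=RNA_PAIRS):
    """True if x_i x_j can pair for every base pair (i, j) of s, i.e. x is in C[s]."""
    if len(seq) != len(structure):
        return False
    return all(seq[i] + seq[j] in pair_alphabet for i, j in base_pairs(structure))


def compatible_set_size(structure, alpha=4, beta=6):
    """|C[s]| = alpha**n_u * beta**n_p, the number of vertices in eq. (3)."""
    n_p = len(base_pairs(structure))
    n_u = len(structure) - 2 * n_p
    return alpha ** n_u * beta ** n_p


print(is_compatible("GGGAAACCC", "(((...)))"))   # True
print(compatible_set_size("(((...)))"))          # 13824 = 4**3 * 6**3
```

For a structure as small as "(.)" a brute-force enumeration of all $4^3$ sequences gives the same count, 24, as the product formula.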



2.3. A Random Graph Construction 

Folding RNA sequences into their secondary structures is computationally quite expensive. It is therefore desirable to construct a simple random model for the sequence-structure map $\psi$ with the same five properties that have been observed for RNA. Reidys et al. [23] have investigated random subgraphs of hypercubes and found that this approach is in fact able to explain the known facts about the combinatory map of RNA secondary structures. 

2.3.1. A Mathematical Concept 

We consider two closely related models. Consider a hypercube $\mathcal{Q}^n_\alpha$. We construct a random subgraph $\Gamma'_\lambda$ by selecting each edge of $\mathcal{Q}^n_\alpha$ independently with probability $\lambda$. From $\Gamma'_\lambda$ we obtain the induced subgraph $\Gamma_\lambda = \mathcal{Q}^n_\alpha[\Gamma'_\lambda]$ by adding all edges between neighboring sequences that have not already been selected by the random process.^ The probability $\lambda$ is simply the (average) fraction of neutral neighbors. 

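The main result about these random subgraph models is that there is a critical value $\lambda^*$ such that, for large $n$ and with probability tending to one, the subgraph $\Gamma_\lambda$ is dense in $\mathcal{Q}^n_\alpha$ and connected (i.e., for any two vertices of $\Gamma_\lambda$ there is a path in $\Gamma_\lambda$ that connects them) whenever $\lambda > \lambda^*$. Explicitly, it has been shown [23] that 

$$ \lambda^* = 1 - \sqrt[\alpha - 1]{\alpha^{-1}}. \qquad (4) $$

Density and connectivity of the neutral network $\Gamma$ result in percolating neutral paths. 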
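A small experiment illustrates equation (4) and the edge-selection model. The sketch is restricted to the binary hypercube ($\alpha = 2$), encodes sequences as integers, and reads the induced subgraph as the subgraph spanned by the endpoints of the selected edges; these choices and all names are our assumptions. Since density and connectivity are asymptotic statements, results at $n = 12$ are only indicative.

```python
import random
from collections import deque


def critical_lambda(alpha):
    """Critical fraction of neutral neighbours, lambda* = 1 - alpha**(-1/(alpha-1))."""
    return 1.0 - alpha ** (-1.0 / (alpha - 1))


def random_neutral_network(n, lam, rng):
    """Edge-selection model on the binary hypercube Q_2^n (vertices are ints).

    Each edge is kept independently with probability lam; the network is the
    subgraph induced on the endpoints of the kept edges.
    """
    vertices = set()
    for v in range(2 ** n):
        for k in range(n):
            w = v ^ (1 << k)          # neighbour differing in bit k
            if v < w and rng.random() < lam:
                vertices.update((v, w))
    return vertices


def is_connected(vertices, n):
    """Breadth-first search on the induced subgraph of Q_2^n."""
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for k in range(n):
            w = v ^ (1 << k)
            if w in vertices and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)


rng = random.Random(1)
print(critical_lambda(2), round(critical_lambda(4), 3))   # 0.5 0.37
for lam in (0.2, 0.8):
    net = random_neutral_network(12, lam, rng)
    print(lam, len(net), is_connected(net, 12))
```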



^Alternatively, one could draw vertices from $\mathcal{Q}^n_\alpha$ and consider the corresponding induced subgraph. Both random subgraph models have essentially the same properties. 



2.3.2. Modeling Generic Fitness Landscapes 



The model formulated above does not take into account that there are in general different probabilities for the two classes of neutral mutations, $\lambda_u \neq \lambda_p$, for the unpaired and paired parts of the secondary structure, respectively. Using that the "graph of the compatible sequences" is a direct product of two hypercubes, this limitation can be overcome by considering the direct product of two random graphs, one in each of the two hypercubes: 

$$ \Gamma = \Gamma_{\lambda_u} \times \Gamma_{\lambda_p}, \qquad \Gamma_{\lambda_u} \subseteq \mathcal{Q}^{n_u}_\alpha, \quad \Gamma_{\lambda_p} \subseteq \mathcal{Q}^{n_p}_\beta. $$

This model inherits its properties from the basic random subgraph model on the two hypercubes. In particular $\Gamma = \Gamma_{\lambda_u} \times \Gamma_{\lambda_p}$ is dense and connected if both "components" $\Gamma_{\lambda_u}$ and $\Gamma_{\lambda_p}$ are dense and connected. From now on we will only refer to this model for deducing our results in this paper. 

A neutral network induces in a natural way a fitness landscape $f_\Gamma$ on the complete sequence space $\mathcal{Q}^n_\alpha$: 

$$ f_\Gamma(x) = \begin{cases} \sigma & \text{if } x \in \Gamma, \\ 1 & \text{otherwise,} \end{cases} $$

with $\sigma > 1$. We call $f_\Gamma$ a single shape landscape, in contrast to the single peak landscapes discussed for instance in [4, 5]. The two degenerate cases $|\Gamma| = 1$ ($\Gamma$ consists of a single sequence) and $\Gamma = \mathcal{Q}^n_\alpha$ correspond to the single peak landscape and the flat landscape, respectively. In the following we will exploit the analogy between single peak and single shape landscapes quite extensively. 
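The single shape landscape is the composition $f = \eta \circ \psi$ of equation (2) with a two-valued evaluation. The sketch below accepts any callable folding map `psi`; `toy_psi` is a hypothetical stand-in that maps every sequence compatible with "(.)" to that shape (so here $\Gamma = C[s]$), not an RNA folding algorithm.

```python
def single_shape_landscape(psi, target_shape, sigma):
    """Return f_Gamma(x) = eta(psi(x)) with eta(target) = sigma and eta = 1 elsewhere."""
    if sigma <= 1:
        raise ValueError("the superiority sigma must exceed 1")

    def f(x):
        return sigma if psi(x) == target_shape else 1.0

    return f


# Hypothetical stand-in for the folding map psi (NOT an RNA folding algorithm):
# a length-3 sequence "folds" into "(.)" if its outer bases can pair.
PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}


def toy_psi(x):
    return "(.)" if len(x) == 3 and x[0] + x[2] in PAIRS else "..."


f = single_shape_landscape(toy_psi, "(.)", sigma=10.0)
print(f("GAC"), f("AAA"))   # 10.0 1.0
```

Keeping $\psi$ and $\eta$ separate mirrors the two phases of figure 1: a real folding routine can be substituted for `toy_psi` without touching the evaluation.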

Summarizing the above discussion, we claim that a single shape landscape is a much more realistic approximation of real fitness landscapes than a single peak landscape or a spin-glass-like model landscape, since these alternatives lack what we think is the most important feature of biomolecular landscapes: a high degree of neutrality. 

In section 5 we present a canonical generalization of the single shape landscape to the more realistic multi-shape landscape and study transitions between two neutral networks. 



2.4. The Birth and Death Process Model 



Let us now return to the dynamic behavior of a population $\mathcal{P}$ on such a landscape. The landscape $f_\Gamma$ induces a bipartition of the population $\mathcal{P}$ into the subpopulation $\mathcal{P}_\mu$ on the network $\Gamma$ and the remaining part $\mathcal{P}_\nu$ of inferior individuals. We call the elements of $\mathcal{P}_\mu$ masters (because they have superior fitness) and those of $\mathcal{P}_\nu$ nonmasters. 

We will describe the evolution of $\mathcal{P}$ in terms of a birth-death model [17] with constant population size. At each step two individuals are chosen randomly; the first one is subject to error-prone replication, while the second one is removed from the population [21]. The stochastic process is specified by the following probabilities: 

$W_{\mu,\mu}$ is the probability that the offspring of a master is again a master; 
$W_{\mu,\nu}$ is the probability that the offspring of a master is a nonmaster; 
$W_{\nu,\mu}$ is the probability that the offspring of a nonmaster is a master; and 
$W_{\nu,\nu}$ is the probability that the offspring of a nonmaster is again a nonmaster. 

In general these probabilities depend on the details of the surroundings of each particular sequence, namely on the number of its neutral neighbors. It is possible, however, to show [23] that the fraction of neutral neighbors obeys a Gaussian distribution which approaches a $\delta$-distribution in the limit of long chains. The same behavior was found numerically for RNA secondary structures. Hence we can assume that the number of neutral neighbors is the same for all masters, namely $\lambda_u n_u$ and $\lambda_p n_p$ for the two classes of neighbors. Consequently the probabilities $W_{\mu,\mu}$, $W_{\mu,\nu}$, and $W_{\nu,\mu}$ are independent of the particular sequence. 

We treat each replication-deletion event as one single event per time step. As a consequence, the equidistant time step $\Delta\tau$ in reactor time corresponds, depending on the individual fitness, to different time intervals $\Delta t$ per replication round in physical time $t$: masters replicate $\sigma$ times faster than nonmasters, yielding $\sigma$ times more offspring per master than per nonmaster per physical time step $\Delta t$. This difference between physical time $t$ and population-dependent reactor time $\tau$ has to be taken into account when calculating the probabilities for the replication-deletion events. 

Analogously to the mutation probabilities we set up four probabilities $P$: 
$P_{\mu,\mu}$ is the probability of choosing a master for both replication and deletion; 
$P_{\mu,\nu}$ is the probability of choosing a master for replication and a nonmaster for deletion; 
$P_{\nu,\mu}$ is the probability of choosing a nonmaster for replication and a master for deletion; 
$P_{\nu,\nu}$ is the probability of choosing a nonmaster for both replication and deletion. 

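For the so-called birth and death probabilities we obtain, with $k$ the current number of masters, 

$$ \mathcal{P}_{k,k+1} = P_{\mu,\nu} W_{\mu,\mu} + P_{\nu,\nu} W_{\nu,\mu} \quad \text{and} \quad \mathcal{P}_{k,k-1} = P_{\mu,\mu} W_{\mu,\nu} + P_{\nu,\mu} W_{\nu,\nu}, $$

respectively. 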
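The birth and death probabilities and the resulting stationary distribution can be evaluated numerically. This sketch uses the standard product formula for birth-death chains rather than the closed Beta-function form derived in [22]. How $P_{\mu,\mu}, \ldots, P_{\nu,\nu}$ depend on $k$ is our assumption: the replicating individual is drawn with probability proportional to fitness ($\sigma$ for masters, 1 otherwise) and the deleted one uniformly; $W_{\mu,\mu}$ and $W_{\nu,\mu}$ are free parameters, and all names are hypothetical.

```python
import math


def birth_death(k, N, sigma, w_mm, w_nm):
    """Birth and death probabilities when the population holds k masters.

    Assumption (not from the text): the replicating individual is chosen with
    probability proportional to fitness and the deleted one uniformly at random.
    """
    r_m = sigma * k / (sigma * k + (N - k))  # master chosen for replication
    r_n = 1.0 - r_m
    d_m = k / N                              # master chosen for deletion
    d_n = 1.0 - d_m
    P_mm, P_mn, P_nm, P_nn = r_m * d_m, r_m * d_n, r_n * d_m, r_n * d_n
    w_mn, w_nn = 1.0 - w_mm, 1.0 - w_nm
    birth = P_mn * w_mm + P_nn * w_nm        # P_{k,k+1}
    death = P_mm * w_mn + P_nm * w_nn        # P_{k,k-1}
    return birth, death


def stationary_distribution(N, sigma, w_mm, w_nm):
    """mu(k) proportional to prod_{j=1..k} P_{j-1,j} / P_{j,j-1}, computed in log space."""
    log_pi = [0.0]
    for k in range(1, N + 1):
        b_prev, _ = birth_death(k - 1, N, sigma, w_mm, w_nm)
        _, d_k = birth_death(k, N, sigma, w_mm, w_nm)
        if d_k <= 0.0:
            raise ValueError("vanishing death probability")
        step = math.log(b_prev / d_k) if b_prev > 0.0 else -math.inf
        log_pi.append(log_pi[-1] + step)
    top = max(log_pi)
    weights = [math.exp(v - top) for v in log_pi]
    total = sum(weights)
    return [w / total for w in weights]


mu = stationary_distribution(N=100, sigma=10.0, w_mm=0.5, w_nm=0.001)
print(sum(k * m for k, m in enumerate(mu)))   # mean number of masters
```

Working with logarithms avoids overflow of the products for realistic population sizes; without back-flow ($W_{\nu,\mu} = 0$) the state $k = 0$ is absorbing and the sketch returns a point mass there.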



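After some lengthy calculations [22] we are able to compute the stationary distribution $\mu_{\mathcal{P}}$ of the birth-death process. According to [7, 17], $\mu_{\mathcal{P}}$ is given by $\mu_{\mathcal{P}}(k) = \pi_{\mathcal{P}}(k) / \sum_{k'} \pi_{\mathcal{P}}(k')$, where $\pi_{\mathcal{P}}(0) = 1$ and $\pi_{\mathcal{P}}(k) = \prod_{j=1}^{k} \mathcal{P}_{j-1,j} / \mathcal{P}_{j,j-1}$. Then the stationary distribution is completely determined by 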
where $B(x,y)$ is the Beta function. $A_i$ and $C_i$ ($i = 1, 2$) are defined as follows: 
For the dynamics in physical time the following holds: 
3. Diffusion on "Neutral" Landscapes 



In general, "diffusion" can be understood as the movement of the barycenter of the population in the high-dimensional sequence space via point mutations. The barycenter $M(t)$ of a population at time $t$ is a real-valued consensus vector specifying the fraction $x_{ia}(t)$ of each nucleotide $a \in \{A, U, G, C\}$ at every position $i$. 
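The consensus vector $M(t)$ can be computed directly from a finite population. In this sketch the population is a list of equal-length strings over {A, U, G, C}; the helper names and the use of the squared Euclidean distance to measure the displacement of $M$ between two times are our assumptions.

```python
def barycenter(population, alphabet="AUGC"):
    """Consensus vector M(t): M[i][a] is the fraction of sequences with nucleotide a at position i."""
    if not population:
        raise ValueError("empty population")
    n, size = len(population[0]), len(population)
    counts = [{a: 0 for a in alphabet} for _ in range(n)]
    for seq in population:
        if len(seq) != n:
            raise ValueError("all sequences must have the same length")
        for i, a in enumerate(seq):
            counts[i][a] += 1
    return [{a: c / size for a, c in col.items()} for col in counts]


def squared_displacement(M_old, M_new):
    """Squared Euclidean distance between two barycenters."""
    return sum((M_new[i][a] - M_old[i][a]) ** 2
               for i in range(len(M_old)) for a in M_old[i])


pop_t = ["GAUC", "GAUC", "GGUC", "CAUC"]
pop_t_later = ["GAUC", "GGUC", "GGUC", "CAUC"]
print(barycenter(pop_t)[0])
print(squared_displacement(barycenter(pop_t), barycenter(pop_t_later)))
```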



3.1. Diffusion in Sequence Space 

Let us assume again that a secondary structure $s \in \mathcal{S}_n$ and its corresponding neutral network $\Gamma$ are fixed. In this section we study the spatial distribution of the strings on the network, i.e. the spatial distribution of $\mathcal{P}_\mu$. Here we understand spatial distribution as the distribution of Hamming distances. For this purpose we introduce the random variable $d_{\mathcal{P}}$ that monitors the pair distances in the population $\mathcal{P}$. The shape of the distribution of $d_{\mathcal{P}}$ is basically determined by the following factors: 

• the distribution of the random variable whose values are the numbers of offspring, 

• the structure of the neutral network $\Gamma$, given by the basic parameters for the construction of the random graph, $\{\lambda_u, \lambda_p, n_p\}$, and 

• the single digit error rate $p$ of the replication-deletion process. 

In the sequel we will assume that $|\mathcal{P}_\mu| = E[X_{\mathcal{P}}]$; in other words, the number of strings located on the neutral network is constant. Taking into consideration the genealogies along the lines of Derrida & Peliti [3], we can express the probability that two masters have had different ancestors in all $t$ previous generations: 

$$ \Pr\{\text{different ancestors in all } t \text{ previous generations}\} \approx e^{-V[Z]\, t / (E[X_{\mathcal{P}}] - 1)}, \qquad (8) $$

where $Z$ describes the number of offspring produced by a master string, viewed as a random variable ($E[\cdot]$ and $V[\cdot]$ denote expectation value and variance, respectively). 

Following Reidys et al. [22], we then consider random walks on the neutral network $\Gamma$. For this purpose we introduce the probability $\varphi_\Gamma(t, h)$ of traveling a Hamming distance $h$ on $\Gamma$ by a random walk lasting $t$ generations. 

In this section we restrict ourselves to alphabets consisting of complementary bases that admit only complementary base pairs (consider for example $\{G, C\}$ or $\{G, C, X, K\}$). We consider moves as point events, i.e. each move occurs at precisely one time step $\Delta t$. By use of the regularity assumption we obtain the infinitesimal error rates $\lambda_u p\, \Delta t$ and $\lambda_p p^2\, \Delta t$ for unpaired and paired digits, respectively. 

Arbitrarily we set 

$$ p_u(t) \approx \tfrac{1}{2}\left(1 - e^{-2\lambda_u p\, t}\right) \quad \text{and} \quad p_p(t) \approx \tfrac{1}{2}\left(1 - e^{-2\lambda_p p^2 t}\right). \qquad (9) $$



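Combining the information on the genealogies and the random walks allows us to compute the distribution of $d_{\mathcal{P}}$ and leads to the main result of this section. For an alphabet consisting of complementary bases with pair alphabet $\mathcal{B}$ we have 

$$ \Pr\{d_{\mathcal{P}} = h\} \approx \int_0^\infty \sum_{h_u + 2h_p = h} \mathrm{Bin}\big(n_u, p_u(2[E[X_{\mathcal{P}}]-1]\tau); h_u\big)\, \mathrm{Bin}\big(n_p, p_p(2[E[X_{\mathcal{P}}]-1]\tau); h_p\big)\, e^{-\tau}\, d\tau, \qquad (10) $$

where $\mathrm{Bin}(m, q; k) = \binom{m}{k} q^k (1-q)^{m-k}$, and $p_u(2[E[X_{\mathcal{P}}]-1]\tau)$ and $p_p(2[E[X_{\mathcal{P}}]-1]\tau)$ are defined above. 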
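Equation (10) can be evaluated numerically. The sketch assumes a binary complementary alphabet with $p_u$ and $p_p$ as in equation (9) and the time argument $2(E[X_{\mathcal{P}}] - 1)\tau$ given in the text; the binomial convolution (each changed base pair adds Hamming distance 2) follows our reading of (10), and all names and parameter values are illustrative. The integral with weight $e^{-\tau}$ is computed by the midpoint rule in $u = 1 - e^{-\tau}$, so the quadrature weights sum to one.

```python
import math


def binom_pmf(n, q, k):
    """Binomial probability of k successes in n trials with success probability q."""
    return math.comb(n, k) * q ** k * (1.0 - q) ** (n - k)


def pair_distance_distribution(n_u, n_p, lam_u, lam_p, p, mean_masters, nodes=400):
    """Pr{d_P = h} for h = 0..n_u + 2*n_p (binary complementary alphabet)."""
    n = n_u + 2 * n_p
    dist = [0.0] * (n + 1)
    for m in range(nodes):
        u = (m + 0.5) / nodes                  # midpoint in u = 1 - exp(-tau)
        tau = -math.log(1.0 - u)
        t = 2.0 * (mean_masters - 1.0) * tau   # time argument as in the text
        q_u = 0.5 * (1.0 - math.exp(-2.0 * lam_u * p * t))       # p_u(t)
        q_p = 0.5 * (1.0 - math.exp(-2.0 * lam_p * p * p * t))   # p_p(t)
        for h_u in range(n_u + 1):
            a = binom_pmf(n_u, q_u, h_u)
            for h_p in range(n_p + 1):
                # a changed base pair contributes Hamming distance 2
                dist[h_u + 2 * h_p] += a * binom_pmf(n_p, q_p, h_p) / nodes
    return dist


dist = pair_distance_distribution(n_u=20, n_p=15, lam_u=0.5, lam_p=0.5, p=0.01, mean_masters=100)
mean_distance = sum(h * w for h, w in enumerate(dist))
print(round(mean_distance, 2))
```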

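Next we turn to the average distance between the populations $\mathcal{P}(t)$ and $\mathcal{P}(t')$, where $t > t'$ are arbitrary times. By this we mean 

$$ \mathrm{dist}(\mathcal{P}(t), \mathcal{P}(t')) = \frac{1}{|\mathcal{P}(t)|\,|\mathcal{P}(t')|} \sum_{x \in \mathcal{P}(t)} \sum_{x' \in \mathcal{P}(t')} d(x, x'), $$

$$ \mathrm{avdist}(\mathcal{P}, \Delta t) = \big\langle \mathrm{dist}(\mathcal{P}(t'), \mathcal{P}(t' + \Delta t)) \big\rangle_{t'}, $$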



Fig. 3. The average pair distance $E[d_{\mathcal{P}}]$ of the master fraction of the population $\mathcal{P}$ on the neutral network $\Gamma$ in the long-time limit. We assume $\lambda = \lambda_u = \lambda_p$. The distance is plotted as a function of the single digit error rate $p$ and of the fraction $\lambda$ of neutral neighbors for the paired and unpaired digits. We observe that for wide parameter ranges the average pair distance of $\mathcal{P}_\mu$ is plateau-like. In particular the average pair distance becomes at the shape-error threshold. 



where $\langle \cdot \rangle_{t'}$ denotes the time average. For binary alphabets with complementary base pairs it is shown in [22] that in the limit of infinite chain length 

$$ \mathrm{avdist}[\mathcal{P}_\mu(t), \Delta t] \qquad (12) $$



(see figure 3). Now we study the displacement of the barycenter of the population $\mathcal{P}_\mu$. For this purpose it is convenient to write the complementary digits $x_i$ of the sequence $x = (x_1, \ldots, x_n)$ as $-1$ and $1$, respectively. We write $x \cdot x' = \sum_{i=1}^{n} x_i x'_i$. 

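The barycenter of the master fraction $\mathcal{P}_\mu$, where $|\mathcal{P}_\mu| = E[X_{\mathcal{P}}]$, denoted by $M_\mu(t)$, is 

$$ M_\mu(t) = \frac{1}{|\mathcal{P}_\mu|} \sum_{x \in \mathcal{P}_\mu(t)} x. \qquad (13) $$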


We can compute the resulting diffusion coefficient $D$ of the barycenter $M_\mu(t)$ in the long-time limit for a population $\mathcal{P}$ replicating on a neutral network $\Gamma$ with constant master fraction (implying a constant mean fitness). 

Explicitly the diffusion coefficient is given by 



l-{[M{t + At)-M{t)f), 

2 XuTluP 



Xu + V[Z] 
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(14) 



and 

^.{[Mit + At) - Mit)]\ = o + M) - M{t)f), , 

where $X_u = 4\lambda_u p\,(E[X_{\mathcal{P}}] - 1)$ and $X_p = 4\lambda_p p^2 (E[X_{\mathcal{P}}] - 1)$. 



3.2. Mutational Buffering 

We can now compare the analytical distributions of $d_{\mathcal{P}}$ with our simulations for binary alphabets (see figure 4). 
Fig. 4. The distribution of $d_{\mathcal{P}}$ in comparison with computer simulations based on the Gillespie algorithm [14]. The simulation data are a time average over 300 generations. The solid lines denote the analytical values; the histograms show the numerical data. 



The difference between the simulated and theoretical density curves is due to an effect known as buffering [16]. On a neutral network a population is preferably located at vertices with higher degree, i.e. at vertices 

$$ v \in \mathrm{v}[\Gamma]: \quad \delta_v > \lambda_u n_u + \lambda_p n_p. $$

For binary alphabets in particular, the expected distance of pairs $v, v'$ with $\delta_v, \delta_{v'} > \lambda_u n_u + \lambda_p n_p$ is $n/2$, since the distance sequence of the Boolean hypercube is given by $\binom{n}{h}$. Therefore we observe a shift to higher pair distances in the population compared with what the theory predicts for regular neutral networks. 



4. Phenotypic Error Threshold 



4.1. Genotypic Error Threshold 

We must distinguish between an error threshold with regard to the genotype (sequence) population and a different error threshold, at higher error rates, with regard to the induced phenotype (structure) population, marking the beginning of drift in structure space. That is the point at which the population can no longer preserve the phenotypic information and optimization breaks down. In the present case this occurred^ at $p \approx 0.1$, versus a sequence error threshold at approximately $p = 0$. What happens in between is, as it turns out, Kimura's neutral scenario [19] in a haploid, asexually reproducing population. 



4.2. Phenotypic Error Thresholds on the "Single Shape" Landscape 

In this section we investigate the stationary distribution of the numbers of strings 
that are located on the neutral network F (contained in a sequence space of fixed 
chain length n). 

We shall discuss the following two extreme cases: on the one hand we can assume that the population size is infinite, and on the other hand that $N \ll |\mathcal{Q}^n_\alpha|$. In the first case, since $n$ is assumed to be fixed, the concentration of masters is nonzero for all error probabilities $p$. 

Next let us consider the case $N \ll |\mathcal{Q}^n_\alpha| = \alpha^n$, i.e. the population size is small compared with the number of all sequences. Since $n_p = O(n)$ and $n_u = O(n)$ hold for any RNA secondary structure, we observe (for sufficiently large $n$) 

$$ \frac{|\Gamma|}{\alpha^n} \ll 1. $$

We now propose the error rate $p^*$ given by the threshold criterion (15) to be the phenotypic error threshold for a population of $N$ strings replicating on a neutral network $\Gamma$; $p^*$ is then also the error threshold of the secondary structure $s$. We immediately see that this criterion generalizes the one used for infinite population size in the single peak landscape of Eigen et al. [4], where $p^*$ is the solution of $c_0(p^*) = 1/\alpha^n$. 

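Let us now discuss the case of infinite population size. In this situation we can apply a completely deterministic ansatz, solving the (well-known) rate equations 

$$ \frac{dc_\mu}{dt} = \sigma W_{\mu,\mu} c_\mu + W_{\nu,\mu} c_\nu - c_\mu \Phi, \qquad \frac{dc_\nu}{dt} = \sigma W_{\mu,\nu} c_\mu + W_{\nu,\nu} c_\nu - c_\nu \Phi, \qquad \Phi = \sigma c_\mu + c_\nu, $$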


^depending on the fraction of neutral neighbors and on the relative superiority between individuals of different phenotype; a detailed study can be found e.g. in [12]. 




(15) 



Fig. 5. For a regular neutral network $\Gamma$ with parameters $\lambda_u = 0.5$ and $\lambda_p = 0.5$ we plot the distribution of $X_{\mathcal{P}}$, i.e. the number of masters in $\mathcal{P}$. 



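for the corresponding concentrations $c_\mu$ and $c_\nu$ of master and nonmaster vertices, respectively. Assuming $W_{\nu,\mu} \approx 0$, i.e. neglecting back-flow mutations [4], and $\ldots \approx 0$, we derive 

$$ c_\mu = \frac{\sigma W_{\mu,\mu} - 1}{\sigma - 1}. \qquad (16) $$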



Using the threshold criterion of equ. (15) we can localize the error thresholds numerically for some population sizes and different neutral network landscapes^, with superiority $\sigma = 10$. The deterministic threshold values are obtained by solving $W_{\mu,\mu} = 1/\sigma$ (cf. equ. (16)) for $p$ (table 1). 
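The deterministic threshold can be located by bisection once $W_{\mu,\mu}$ is known as a function of $p$. The function `toy_w_mm` below is a hypothetical model (every digit is copied correctly or mutated neutrally with probability $\lambda$, ignoring base pairs), not the neutral network model of section 2.3; for $\lambda = 0$ it reduces to Eigen's single peak condition $(1-p^*)^n = 1/\sigma$. All names are our own.

```python
def master_fraction(sigma, w_mm):
    """Deterministic stationary master fraction (sigma*W_mm - 1)/(sigma - 1), clipped at 0."""
    return max(0.0, (sigma * w_mm - 1.0) / (sigma - 1.0))


def error_threshold(w_mm_of_p, sigma, lo=0.0, hi=0.5, tol=1e-12):
    """Bisection for the error rate p* with W_mm(p*) = 1/sigma.

    Assumes W_mm decreases monotonically in p on [lo, hi].
    """
    target = 1.0 / sigma
    if not w_mm_of_p(lo) >= target >= w_mm_of_p(hi):
        raise ValueError("threshold is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if w_mm_of_p(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def toy_w_mm(n, lam):
    """Hypothetical W_mm(p) = (1 - p + lam*p)**n: each digit is copied correctly or
    mutated neutrally with probability lam (base-pair structure is ignored)."""
    return lambda p: (1.0 - p + lam * p) ** n


for lam in (0.0, 0.5, 0.8):
    print(lam, round(error_threshold(toy_w_mm(50, lam), sigma=10.0), 4))
```

In this toy model a larger fraction of neutral mutations shifts the threshold to higher error rates, which is the qualitative trend the single shape landscape is meant to capture.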



Table 1: Theoretical and numerical error thresholds (for $\sigma = 10$) 
0.27 


0.5 


0.081 
0.095 


0.095 
0.8 


0.118 


0.116 


0.11 



Finally, we close this section by plotting the densities of the $i$-th incompatible classes $C_i[s]$ (see figure 6) of the population obtained from our simulations^. We observe that at the error threshold there is a sharp transition from a population that is localized on the neutral network to a population that is uniformly distributed in sequence space. 

^The calculations are done with Mathematica [30]. 
Fig. 6. Error classes of incompatible distances $C_i[s]$ for certain neutral network landscapes. The underlying population size for the Gillespie simulation is $N = 1000$. The error class * denotes the number of masters, i.e. the number of strings that are localized on the neutral network. 



5. Transitions between Neutral Networks 



Each neutral network is contained in the set of compatible sequences, i.e. the set of sequences that could fold into one particular structure $s$. Any two sets of compatible sequences, taken with respect to a pair of secondary structures, have a nonempty intersection. This fact and the mathematical modeling of neutral networks as random graphs imply that the Hamming distance between two neutral networks is bounded above by four. It turns out that the intersection is of particular relevance for transitions of finite populations of erroneously replicating strings between neutral networks. In other words, the intersection plays a key role in the search for "fitter" secondary structures. 

^In contrast to the ansatz of constant population size (the basic assumption of the birth-death model), the simulations are obtained by use of the Gillespie algorithm [14]. 

It has been proven in [23] that the intersection is always nonempty. The intersection is constructed explicitly by using an algebraic representation of secondary structures. As already proposed in [23], each secondary structure $s$ can be interpreted as an element of the symmetric group $S_n$ by use of the mapping 

$$ \pi: \mathcal{S}_n \to S_n, \qquad \pi(s) = \prod_{i=1}^{n_p(s)} (x_i, x'_i). $$

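Here $(x_i, x'_i)$ is a base pair in $s$, read as the transposition of the positions $x_i$ and $x'_i$, and $n_p(s)$ is the number of pairs in $s$. Since these transpositions are disjoint, $\pi(s)$ is an involution, i.e. $\pi(s)\pi(s) = \mathrm{id}$. 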
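The representation of structures as involutions is easy to reproduce. This sketch (dot-bracket input and all names are our assumptions) builds $\pi(s)$ as a permutation list, checks $\pi(s)\pi(s) = \mathrm{id}$, and computes the order $m$ of the product $\pi(s)\pi(s')$, which indexes the dihedral group $D_m$ generated by the two involutions.

```python
from math import gcd


def involution(structure):
    """pi(s) as a permutation list: pi[i] = j and pi[j] = i for every base pair (i, j)."""
    perm = list(range(len(structure)))
    stack = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            i = stack.pop()
            perm[i], perm[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return perm


def compose(a, b):
    """Product ab of permutations given as lists: apply b first, then a."""
    return [a[b[i]] for i in range(len(b))]


def order(perm):
    """Order of a permutation: least common multiple of its cycle lengths."""
    seen, result = set(), 1
    for start in range(len(perm)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        if length:
            result = result * length // gcd(result, length)
    return result


s, s2 = "(())", "()()"
print(compose(involution(s), involution(s)))           # [0, 1, 2, 3]
print(order(compose(involution(s), involution(s2))))   # 2
```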

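Using the fact that any two involutions $\iota, \iota'$ generate a dihedral group $D_m = \langle \iota, \iota' \rangle$, for secondary structures $s$ and $s'$ this leads to the mapping 

$$ \gamma: \mathcal{S}_n \times \mathcal{S}_n \to \{D_m < S_n\}, \qquad \gamma(s, s') = \langle \pi(s), \pi(s') \rangle. $$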

In fact the operation of (7r(s), 7r(s')) and especially the corresponding cycle de- 
composition is closely related to the structure of the intersection I[s, s']. 

An arbitrary cycle $z$ is given by a sequence of positions in which predecessor and successor are determined alternately by the pairs in $s$ and $s'$. We distinguish between two types of cycles: open ones, whose end positions are unpaired in $s$ or in $s'$, and closed ones. 



5.1. Size of the Intersection 

This knowledge enables us to determine the size of the intersection. For alphabets $\mathcal{A}$ with complementary base pairs, e.g. $\mathcal{A} = \{G, C, X, K\}$ with corresponding base pairings $\mathcal{B} = \{GC, CG, XK, KX\}$, and $\alpha = |\mathcal{A}|$ we obtain 

$$ |I[s, s']| = \prod_{i \geq 1} \alpha^{n_i}, $$

where $n_i$ is the number of cycles of length $i$. If we consider the physical alphabet $\mathcal{A} = \{G, C, A, U\}$ with corresponding pair alphabet $\mathcal{B} = \{GC, CG, AU, UA, GU, UG\}$ we obtain 

$$ |I[s, s']| = \prod_{i \geq 1} (a_i^{o})^{n_i^{o}} \, (a_i^{c})^{n_i^{c}}, $$

where $n_i^{c}$ is the number of closed cycles of length $i$, $n_i^{o}$ is the number of open cycles of length $i$, and $a_i^{o}$ and $a_i^{c}$ are the numbers of all possible configurations for an open cycle of length $i$ or a closed cycle of length $i$ with 


5.2. Structure of the Intersection 



Definition 1. Let $s$ and $s'$ be two secondary structures. The graph $I[s, s']$ has the vertex set $I[s, s']$. Two sequences $x, y \in I[s, s']$ are neighbors, i.e. $\{x, y\} \in e[I[s, s']]$, if and only if $x$ and $y$ are neighbors in $C[s]$ and in $C[s']$. 

This means that the intersection graph can be embedded directly in the graph structure of the sets of compatible sequences [23]. Note that the common unpaired positions of $s$ and $s'$ are the elements of the cycles of length 1, and the common pairs of $s$ and $s'$ are represented by the closed cycles of length 2. Thus there are two scenarios: 

(1) There are only open cycles of length 1 and closed cycles of length 2; then $I[s, s']$ is a connected graph. 

(2) There is at least one cycle of length greater than or equal to 3 or one open cycle of length 2; then $I[s, s']$ decomposes into components of equal size $(a_1^{o})^{n_1^{o}} \cdot (a_2^{c})^{n_2^{c}}$. The components are connected by paths in $C[s]$ and $C[s']$. 
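The cycle decomposition and the intersection sizes of section 5.1 can be computed with a short sketch. Here the cycles are the connected components of the graph whose edges are the base pairs of $s$ and of $s'$; a component is closed if it has as many edges as positions (a common pair counts as a closed cycle of length 2). Counting the admissible nucleotide assignments of a component gives, in our reading, $a_i^{o}$ = sum of the entries of $A^{i-1}$ and $a_i^{c} = \mathrm{tr}(A^i)$, where $A$ is the adjacency matrix of the base-pairing graph; for complementary alphabets both equal $\alpha$. Dot-bracket input and all names are assumptions.

```python
def partners(structure):
    """partner[i] = j if (i, j) is a base pair, -1 if position i is unpaired."""
    partner, stack = [-1] * len(structure), []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def cycles(s, t):
    """(length, closed) for each component of the graph of base pairs of s and t."""
    ps, pt = partners(s), partners(t)
    seen, result = set(), []
    for start in range(len(s)):
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in (ps[v], pt[v]):
                if w != -1 and w not in seen:
                    seen.add(w)
                    stack.append(w)
        edges = sum(ps[v] != -1 for v in comp) // 2 + sum(pt[v] != -1 for v in comp) // 2
        result.append((len(comp), edges == len(comp)))
    return result


def intersection_size(s, t, alphabet, pair_alphabet):
    """|I[s, t]| as a product over cycles of a_i^o = sum(A**(i-1)) or a_i^c = trace(A**i)."""
    A = [[int(a + b in pair_alphabet) for b in alphabet] for a in alphabet]
    size = len(alphabet)

    def power(e):
        P = [[int(r == c) for c in range(size)] for r in range(size)]
        for _ in range(e):
            P = [[sum(P[r][m] * A[m][c] for m in range(size)) for c in range(size)]
                 for r in range(size)]
        return P

    total = 1
    for length, closed in cycles(s, t):
        if closed:
            P = power(length)
            total *= sum(P[r][r] for r in range(size))
        else:
            total *= sum(map(sum, power(length - 1)))
    return total


def intersection_is_connected(s, t):
    """Scenario (1): only open cycles of length 1 and closed cycles of length 2."""
    return all(c in ((1, False), (2, True)) for c in cycles(s, t))


RNA_PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}
COMPLEMENTARY = {"GC", "CG", "XK", "KX"}
print(intersection_size("(())", "()()", "GCAU", RNA_PAIRS))        # 14
print(intersection_size("(())", "()()", "GCXK", COMPLEMENTARY))    # 4
print(intersection_is_connected("(.)", "(.)"), intersection_is_connected("(())", "()()"))
```

A brute-force count over all $4^4$ sequences compatible with both "(())" and "()()" gives the same value, 14, for the physical alphabet.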
Fig. 7. The first picture shows the distribution of Hamming distances from elements of $C[s]$ to the intersection $I[s, s_1]$ (solid line) and to $I[s, s_2]$ (dotted line), respectively. The second and third pictures show Gillespie simulations, assuming the same high fitness for both neutral networks and a low fitness elsewhere. The population evidently uses the intersection to move from one network to the other. 



5.3. Numerical Results 



Suppose we are given two pairs of structures $(s, s_1)$ and $(s, s_2)$. We assume all $\lambda$ values to be equal and an action probability of 1/2 on the intersection (figure 7). 



The numerical results confirm the basic assumption of the neutral theory of 
Motoo Kimura [19]. The fixation of phenotypes is a consequence of a stochastic 
process. 



6. Conclusions 



Evolutionary optimization on RNA secondary structure folding landscapes differs from optimization on typical rugged fitness landscapes. There are no local optima in the naive sense, but rather extended labyrinths of connected equivalent sequences which somewhere touch or come close to labyrinths of better sequences [13, 29]. What looks like punctuated equilibria in one projection (the phenotype) presents itself as relentless and extensive change in another projection, such as the genetic makeup. Seen from this perspective, the replicator concept that views genes as the sole unit of selection [2] may need an overhaul, since phenotypes are here to stay much longer than genes [1, 28]. 

Additional constraints at the sequence and the structure level may severely restrict the extent of neutral networks. However, there is no doubt about the evolutionary implications, should it turn out that RNA structures capable of performing biochemically interesting tasks do form neutral networks in sequence space or can be accessed from such networks. Given present-day in vitro evolution techniques, these issues are within reach of experimental investigation. 